Link to repo: https://github.com/sarstanc/DV_FinalProject
Utility rates by zipcode: https://catalog.data.gov/dataset/u-s-electric-utility-companies-and-rates-look-up-by-zipcode-feb-2011-57a7c
Unemployment rates by zipcode: http://blog.splitwise.com/2014/01/06/free-us-population-density-and-unemployment-rate-by-zip-code/
Population by zipcode: http://blog.splitwise.com/2013/09/18/the-2010-us-census-population-by-zip-code-totally-free/
We truncated the data to zipcodes in Texas, Oklahoma, and Louisiana (zipcodes between 70000-80000) to limit the focus and decrease the size of the file.
Select the csv file to import
Check that left and right enclosures do not conflict with data
Name the table
Choose all columns that go into the Oracle table
Change column names to remove spaces and starting numbers
Load each data set one at a time to blend rather than join
Edit relationships between data sets to link common columns
Existing relationships will appear here. Select “custom” to change.
Use Population by zipcode as the primary dataset and Unemployment and Utilities as the secondary datasets.
Orange indicates the secondary data sources
Resrate, Indrate, and Comrate are all dimensions from the Utility Rates data source. Since these dimensions contain string values, they must individually be converted to integers in order to calculate an average.
Crosstabs with utility company name and unemployment rate by zip code Generate longitude and latitude to show residential utility rates compared to population by zip code
Filter out null residential utility costs
Bar graph shows average utility cost broken down into residential, commercial, and industrial costs per zip code
Filter out null and 0 value utility costs
library("rjson")
library("RCurl")
## Loading required package: bitops
require(tidyr)
## Loading required package: tidyr
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
source("../04_R/unemployment_rank.R")
3 part join
source("../04_R/avgrate.R")
## Loading required package: ggplot2
Multiplied by -1 to workaround for calculated field